What is Mixtral? (Mixtral of Experts)

Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model. It builds on the architecture of the Mistral 7B model but is designed with a unique feature: each layer contains 8 specialized processing units, called experts. When processing a token, the model chooses 2 experts to contribute to the final output. This means that for any input, the result comes from a mix of these experts' outputs.

How Does It Work?

Mixtral has a total of 47 billion parameters, but it only uses 13 billion for each token during processing. This selective use helps keep costs down and speeds up response times. The model was trained using publicly available web data and can consider a context of 32 tokens. Mixtral is reported to perform better than Llama 2 80B, being 6 times faster and matching or exceeding GPT-3.5's performance on various benchmarks.

The models are available under the Apache 2.0 license.

Performance and Features

Mixtral excels at tasks like math reasoning, coding, and multilingual support. It can understand and generate text in languages like English, French, Italian, German, and Spanish. Mistral AI has also released a special Mixtral 8x7B Instruct model that outperforms several competitive models, including GPT-3.5 Turbo and Claude-2.1, according to human benchmarks.

Mixtral shows strong performance in different benchmarks, matching or surpassing Llama 2 models while using five times fewer parameters. It also exhibits less bias in answering questions compared to Llama 2.

Long-Range Information Retrieval

Mixtral is effective at retrieving information from a context of up to 32,000 tokens, regardless of where the information appears. In tests, Mixtral achieved 100% accuracy in finding a specific passkey within long prompts, showing its capability to handle extensive context.

Mixtral 8x7B Instruct Model

Alongside the base Mixtral model, a Mixtral 8x7B - Instruct version has been released. This chat model is fine-tuned for following instructions and ranks highly on the Chatbot Arena Leaderboard as of January 28, 2024.

Prompting Mixtral

To get the best results from the Mixtral 8x7B Instruct model, you can use this chat format:

```
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```

In the examples below, we use Mistral's Python client to demonstrate how to interact with the model effectively.

Basic Prompting Example

Here’s a straightforward prompt asking the model to create a JSON object:

Prompt:
```
[INST] You are a helpful code assistant. Generate a valid JSON object based on this information:
name: John
lastname: Smith
address: #1 Samuel St.
Just give me the JSON object without explanations: [/INST]
```

Output:
```json
{
  "name": "John",
  "lastname": "Smith",
  "address": "#1 Samuel St."
}
```

Few-Shot Prompting

You can use different roles (system, user, assistant) in prompts to guide the model's responses. Here's how to set it up:

```python
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
from dotenv import load_dotenv

load_dotenv()
import os

api_key = os.environ["MISTRAL_API_KEY"]
client = MistralClient(api_key=api_key)

def get_completion(messages, model="mistral-small"):
    chat_response = client.chat(
        model=model,
        messages=messages,
    )
    return chat_response

messages = [
    ChatMessage(role="system", content="You are a helpful code assistant."),
    ChatMessage(role="user", content="Convert this information to JSON: name: John, lastname: Smith, address: #1 Samuel St."),
    ChatMessage(role="assistant", content="{ \"name\": \"John\", \"lastname\": \"Smith\", \"address\": \"#1 Samuel St.\" }"),
    ChatMessage(role="user", content="Now convert: name: Ted, lastname: Pot, address: #1 Bisson St.")
]

chat_response = get_completion(messages)
print(chat_response.choices[0].message.content)
```

Output:
```json
{
  "name": "Ted",
  "lastname": "Pot",
  "address": "#1 Bisson St."
}
```

Code Generation Example

Mixtral also excels in generating code. Here’s a simple prompt to create a Python function:

```python
messages = [
    ChatMessage(role="system", content="You are a code assistant for writing Python code. Only produce the function."),
    ChatMessage(role="user", content="Create a function to convert Celsius to Fahrenheit.")
]

chat_response = get_completion(messages)
print(chat_response.choices[0].message.content)
```

Output:
```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32
```

Safety Features

Similar to the Mistral 7B model, Mixtral allows for safety features to ensure respectful interactions. By setting `safe_mode=True`, you can prevent harmful or negative responses:

```python
def get_completion_safe(messages, model="mistral-small"):
    chat_response = client.chat(
        model=model,
        messages=messages,
        safe_mode=True
    )
    return chat_response

messages = [
    ChatMessage(role="user", content="Say something mean.")
]

chat_response = get_completion_safe(messages)
print(chat_response.choices[0].message.content)
```

Output:
"I'm sorry, but I cannot say something mean. My goal is to provide positive and respectful interactions."

When `safe_mode=True`, the model responds with a commitment to positive and respectful communication.